Lecture
September 13, 2023
## Today

- Prior information
- Bayesian Inference
- Sampling
- Example: storm surge distribution
- Alternative priors
- Wrapup
\[ \Pr \left\{ \theta \mid y \right\} = \frac{\Pr \left\{ y \mid \theta \right\} \Pr \left\{ \theta \right\}}{\Pr \left\{ y \right\}} \]

## Application: rare disease {.smaller .scrollable}
Define \(y\) as getting a positive test result and \(\theta\) as having the underlying condition. Note that we do not observe \(\theta\) directly! Here \(y = 1\) and we want to know \(\Pr\left\{\theta = 1 \mid y = 1 \right\}\).
Likelihood:
| | \(\Pr\left\{y = 1 \mid \theta\right\}\) | \(\Pr\left\{y = 0 \mid \theta\right\}\) |
|---|---|---|
| \(\theta = 1\) | 0.99 | 0.01 |
| \(\theta = 0\) | 0.01 | 0.99 |
A naive application of maximum likelihood: \(\Pr\left\{y=1 \mid \theta=1 \right\} > \Pr\left\{y=1 \mid \theta=0 \right\}\), so the best estimate is \(\theta=1\).
We are studying \(\Pr\left\{\theta = 1 \mid y = 1 \right\}\). With a prior \(\Pr\left\{\theta = 1\right\} = 0.001\), Bayes' rule gives \(\Pr\left\{y = 1\right\} = 0.01098\) and \(\Pr\left\{\theta = 1 \mid y = 1\right\} \approx 0.09016\): even after a positive test, having the condition remains unlikely.
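As a quick check, here is a minimal sketch of this calculation, assuming a prior \(\Pr\left\{\theta = 1\right\} = 0.001\) (the variable names are illustrative):

```julia
# Bayes' rule for the rare-disease example (prior of 0.001 is an assumption)
p_theta = 0.001                 # prior Pr{θ = 1}
p_pos_given_sick = 0.99         # Pr{y = 1 | θ = 1} (sensitivity)
p_pos_given_healthy = 0.01      # Pr{y = 1 | θ = 0} (false positive rate)

# marginal probability of a positive test, Pr{y = 1}
p_pos = p_pos_given_sick * p_theta + p_pos_given_healthy * (1 - p_theta)

# posterior probability of the condition given a positive test
posterior = p_pos_given_sick * p_theta / p_pos

println(p_pos)      # 0.01098
println(posterior)  # ≈ 0.09016
```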
## Bayesian Inference
\[ p(\theta \mid y) = \frac{p(y \mid \theta) p(\theta)}{p(y)} \]
If we are drawing samples from a distribution, we only need the posterior up to a constant of proportionality, and since \(p(y)\) doesn't depend on \(\theta\), we can usually ignore it.
\[ \overbrace{p(\theta \mid y)}^{\text{posterior}} \propto \underbrace{p(y \mid \theta)}_{\text{likelihood}} \overbrace{p(\theta)}^{\text{prior}} \]

## Coin flipping
We flip a coin a few times. We want to estimate the probability of heads so that we can make well-calibrated bets on future coin tosses.
Counting heads in our sample of 9 flips gives \(n_\text{heads} = 8\).
The maximum likelihood estimate (MLE) is the most likely value of \(\theta\) given the data. As before, we can work with the log-likelihood.
```julia
n_heads / length(coin_flips)
```

We should be suspicious of our analysis when it concludes that we will continue to see 8 out of 9 flips coming up heads forever.
To perform a Bayesian analysis, we'll need a prior. A Beta distribution is a natural choice for a prior on a probability, although we could use a Uniform distribution or even something silly like a truncated Gamma (don't!).
Cool property: if you have a Beta prior and a Binomial likelihood, the posterior is also Beta distributed. Look up Beta-Binomial conjugacy for more! We will leverage this property to check our answers.
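As a sketch of how this conjugacy lets us check answers: with a \(\mathrm{Beta}(\alpha, \beta)\) prior and \(k\) heads in \(n\) flips, the posterior is \(\mathrm{Beta}(\alpha + k,\ \beta + n - k)\). The \(\mathrm{Beta}(1, 1)\) prior below is illustrative, not necessarily the one used in lecture:

```julia
using Distributions

# Illustrative: closed-form Beta posterior for the coin-flip example
α, β = 1.0, 1.0                    # Beta(1, 1) prior: uniform on [0, 1]
k, n = 8, 9                        # 8 heads in 9 flips

# Beta-Binomial conjugacy: the posterior is also a Beta distribution
closed_form = Beta(α + k, β + (n - k))

mean(closed_form)                  # posterior mean (α + k) / (α + β + n)
```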
## Sampling
See the very good Wikipedia article
Simple sampling algorithms struggle on harder problems (the acceptance probability `p_accept` gets very small). Modern samplers leverage gradients and clever tricks to draw better samples for harder problems. Let's use them!
We can write down the full Bayesian model in Turing, which uses a syntax very close to our notation!
We can leverage sophisticated machinery for drawing samples from arbitrary posterior distributions. For now, we will trust that it is drawing samples from \(p(\theta \mid y)\) and not worry about the details.
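A minimal sketch of what such a model might look like (the data and the \(\mathrm{Beta}(1, 1)\) prior are illustrative assumptions, not necessarily what was used in lecture):

```julia
using Turing

# Illustrative Turing model for the coin-flip example
@model function coin_model(y)
    θ ~ Beta(1, 1)              # prior on the probability of heads
    for i in eachindex(y)
        y[i] ~ Bernoulli(θ)     # each flip comes up heads with probability θ
    end
end

coin_flips = [1, 1, 1, 1, 1, 1, 1, 1, 0]   # 8 heads in 9 flips
coin_chain = sample(coin_model(coin_flips), NUTS(), 2_000)
```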
```
Summary Statistics
  parameters      mean       std      mcse   ess_bulk   ess_tail      rhat   ⋯
      Symbol   Float64   Float64   Float64    Float64    Float64   Float64   ⋯

           θ    0.6832    0.1021    0.0016  4234.2533  5796.1544    1.0003   ⋯
                                                              1 column omitted
```
We can visualize our posterior
```julia
histogram(
    coin_chain[:θ];
    label="Samples",
    normalize=:pdf,
    legend=:topleft,
    xlabel=L"θ",
    ylabel=L"p(θ | y)",
)
plot!(closed_form; label="Exact Posterior", linewidth=3)
plot!(prior_dist; label="Prior", linewidth=3)
vline!([θ_mle]; label="MLE", linewidth=3)
```

## Compromise
The posterior is a compromise between the prior and the likelihood.
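For the Beta-Binomial setup above, this compromise can be made precise: with a \(\mathrm{Beta}(\alpha, \beta)\) prior and \(k\) heads in \(n\) flips, the posterior mean is a weighted average of the prior mean and the MLE, with the data getting more weight as \(n\) grows.

\[
\mathbb{E}\left[\theta \mid y\right]
= \frac{\alpha + k}{\alpha + \beta + n}
= \frac{\alpha + \beta}{\alpha + \beta + n} \cdot \underbrace{\frac{\alpha}{\alpha + \beta}}_{\text{prior mean}}
+ \frac{n}{\alpha + \beta + n} \cdot \underbrace{\frac{k}{n}}_{\text{MLE}}
\]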
## Example: storm surge distribution
Define a LogNormal distribution with very diffuse (flat) priors
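A sketch of such a model in Turing (the specific diffuse priors and the model name here are illustrative assumptions, not necessarily those used in lecture):

```julia
using Turing

# Illustrative: LogNormal storm surge model with very diffuse priors
@model function surge_model(y)
    μ ~ Normal(0, 100)                       # diffuse prior on the log-scale location
    σ ~ truncated(Normal(0, 100); lower=0)   # diffuse prior, constrained positive
    for i in eachindex(y)
        y[i] ~ LogNormal(μ, σ)               # annual-maximum surge observations
    end
end
```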
We leverage the histogram2d function to visualize the 2D posterior distribution.
Each draw from the posterior represents a plausible value of \(\mu\) and \(\sigma\). We can use these to explore the distribution of return periods.
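A sketch of the idea (the posterior draws `μs` and `σs` and the 100-year return period below are hypothetical illustrations):

```julia
using Distributions

# Illustrative posterior draws of the LogNormal parameters
μs = [1.35, 1.37, 1.39]
σs = [0.18, 0.19, 0.17]

return_period = 100          # years
p = 1 - 1 / return_period    # non-exceedance probability for that return period

# each posterior draw implies a different 100-year return level
return_levels = [quantile(LogNormal(μ, σ), p) for (μ, σ) in zip(μs, σs)]
```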
Visualize the samples as a chain
## Alternative priors
We can treat the priors as parameters so that we don’t have to define a new @model each time we want to update our priors
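One way to do this is to pass the prior distributions in as model arguments (a sketch; the names, data, and priors are illustrative):

```julia
using Turing

# Illustrative: priors are arguments, so one @model covers many prior choices
@model function surge_model(y, μ_prior, σ_prior)
    μ ~ μ_prior
    σ ~ σ_prior
    for i in eachindex(y)
        y[i] ~ LogNormal(μ, σ)
    end
end

# swap priors without redefining the model
model = surge_model([3.9, 4.2, 3.5], Normal(0, 10), truncated(Normal(0, 5); lower=0))
```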
Define priors
Draw samples from the prior
Plot the consequences of these samples
If we are getting return levels of \(10^{12}\) ft, we should probably revise our priors
We can sample
We use the same model to get the posterior. Often we want to run multiple chains with different initial values to make sure we are getting good samples.
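In Turing, multiple chains can be drawn in a single `sample` call, with `MCMCThreads` running them in parallel (a sketch, reusing an illustrative model and data):

```julia
using Turing

# Illustrative LogNormal model with priors passed as arguments
@model function surge_model(y, μ_prior, σ_prior)
    μ ~ μ_prior
    σ ~ σ_prior
    for i in eachindex(y)
        y[i] ~ LogNormal(μ, σ)
    end
end

model = surge_model([3.9, 4.2, 3.5, 4.8], Normal(0, 10), truncated(Normal(0, 5); lower=0))

# four chains with different random initial values, run in parallel
chain = sample(model, NUTS(), MCMCThreads(), 2_000, 4)
```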
```
Summary Statistics
  parameters      mean       std      mcse    ess_bulk    ess_tail      rhat   ⋯
      Symbol   Float64   Float64   Float64     Float64     Float64   Float64   ⋯

           μ    1.3692    0.0194    0.0001  17256.1692  13201.9364    1.0003   ⋯
           σ    0.1861    0.0138    0.0001  17431.4216  13529.7674    1.0001   ⋯
                                                                1 column omitted
```
Note
Here our likelihood is very informative, so it doesn’t much matter if we use excessively diffuse priors. This is nice, though not something we can count on in general.
As before, we can visualize our posterior distribution in terms of return periods
## Wrapup
The official docs are great.
Warning
Google will often try to link you to the old site, https://turing.ml/. This is out of date! Use https://turinglang.org/stable/ instead.
Do not just plug in the problem and paste the solution!
Do use it for syntax help and code explanations